- First, a few warnings. It is best to run MEMSPD without any TSRs or
- memory managers loaded. If the 386 option is specified, MEMSPD will use
- 32-bit instructions, which will cause unpredictable effects if the CPU is
- not a 386 or 486. The 386 option will also cause MEMSPD to attempt to go
- into protected mode to test extended memory if it thinks there is any.
- Most memory managers hide the extended memory from MEMSPD, but if they
- don't, the attempt will fail with unknown results.
-
- The purpose of MEMSPD is to give an indication of the relative speed of
- different areas of memory in a PC, to compare memory speeds of different
- computers, and to learn interesting facts about how memory and the CPU
- cache work. It does this by timing a repeated string instruction (LODS)
- that copies memory to a register. Because the repeat count is large
- (0x800), the time needed to read the timer and set up the instruction
- becomes negligible, and the result is very close to the actual time to
- execute each memory access.
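
The effect of the large repeat count can be illustrated with a little
arithmetic. This is a Python sketch, not part of MEMSPD. The 5 usec
overhead is a made-up figure chosen only for illustration, and the 0.24
usec access time is the cache-hit value from the results later in this
file.

```python
# Illustration only: how a fixed measurement overhead is diluted by the
# repeat count. OVERHEAD_US is a hypothetical figure, not a measured one.
REPEAT_COUNT = 0x800      # repeat count used by MEMSPD (2048)
OVERHEAD_US = 5.0         # ASSUMED cost of reading the timer and setup
TRUE_ACCESS_US = 0.24     # cache-hit time from the results in this file


def measured_per_access(repeat_count, overhead_us, access_us):
    """Per-access time as reported if the overhead is not subtracted."""
    total_us = overhead_us + repeat_count * access_us
    return total_us / repeat_count


# The error per access is simply overhead / repeat count.
error_us = (measured_per_access(REPEAT_COUNT, OVERHEAD_US, TRUE_ACCESS_US)
            - TRUE_ACCESS_US)
print(f"error per access: {error_us:.4f} usec")
```

Under these assumptions the error is far below the two decimal places
that MEMSPD prints.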
-
- MEMSPD uses the 8-bit and 16-bit forms of the LODS instruction to access
- memory (and the 32-bit form if -386 is specified). It also does two
- accesses for each form - one in which it attempts to have the CPU cache
- filled with data other than what it is reading (all cache misses), and one
- in which it attempts to have the data already in the cache (all cache hits).
- For each access, the current time is read from timer 0, 0x800 accesses are
- made, the time is read again and the difference in microseconds is printed.
- If the difference between the cache-miss and cache-hit times is less than
- 0.04 microseconds, it is assumed that the difference is due to experimental
- error, and only a single time is printed.
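
The measurement step could be sketched roughly as follows. This is a
Python illustration, not MEMSPD's actual source. It assumes the standard
PC timer input clock of 1,193,182 Hz, a 16-bit down-counter that steps by
2 per input clock in Mode 3, and at most one counter wrap between the two
readings. The function names are hypothetical, and which value is printed
when the two times agree is a guess.

```python
# Sketch only: converts two timer 0 readings into usec per access.
PIT_HZ = 1_193_182        # input clock of the PC interval timer
COUNTS_PER_CLOCK = 2      # ASSUMPTION: Mode 3 steps the count by 2 per clock
ACCESSES = 0x800          # accesses made between the two readings


def per_access_usec(start_count, end_count, accesses=ACCESSES):
    """Time per access from two readings of the 16-bit down-counter."""
    # The counter counts down, so start - end is positive unless it wrapped.
    counts = (start_count - end_count) % 0x10000
    clocks = counts / COUNTS_PER_CLOCK
    return clocks / PIT_HZ * 1e6 / accesses


def format_result(hit_usec, miss_usec, threshold=0.04):
    """Show one number when hit and miss times agree within the threshold."""
    if abs(miss_usec - hit_usec) < threshold:
        return f"{hit_usec:.2f}"   # ASSUMPTION: the hit time is the one kept
    return f"{hit_usec:.2f}-{miss_usec:.2f}"
```

For example, format_result(0.24, 0.33) gives "0.24-0.33", the same shape
as the entries in the results shown in this file.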
-
- For best results, MEMSPD should not be run under any sort of memory
- manager, as they can hide extended memory or disturb the timings. When
- the 386 option is specified, MEMSPD does not check whether the CPU
- actually is a 386 or greater; unpredictable results will occur if this
- option is used with a 286 or below. MEMSPD assumes that Timer 0 is being
- run in Mode 3 (the default for most BIOSes), but it is possible that a TSR
- or BIOS might run it in a different mode. I would suggest doing a sanity
- check, such as the one described below, to make sure your numbers make
- sense. On very slow machines (e.g. 8 MHz or less) the time it takes to do
- one repeated LODS may exceed the resolution of the timer, giving wildly
- inaccurate numbers.
-
- My Everex AGI 386-25 gives the following results:
-
- 00000 - 9FFFF byte 0.24-0.33, word 0.24-0.42, dwrd 0.24-0.59
- A0000 - B7FFF byte 0.76, word 0.76, dwrd 1.37
- B8000 - BBFFF byte 2.09, word 1.78, dwrd 3.75
- BC000 - BFFFF byte 2.05-1.92, word 2.17, dwrd 3.75
- C0000 - C7FFF byte 0.76, word 0.76, dwrd 1.37
- C8000 - DFFFF byte 1.18, word 2.19, dwrd 4.24
- E0000 - EFFFF byte 0.76, word 0.76, dwrd 1.37
- F0000 - FFFFF byte 0.24-0.33, word 0.24-0.42, dwrd 0.24-0.59
- 100000 - 43FFFF byte 0.24-0.33, word 0.24-0.42, dwrd 0.24-0.59
-
- The following can be inferred from these numbers.
-
- 1. The memory from 0 - 9FFFF and F0000 and above is cached, since there are
- two sets of numbers for each test. Also, all of this memory is 32-bit
- memory, since the access times for byte, word and dword are all the
- same.
-
- 2. The memory from A0000 - B7FFF, which is VGA video memory, is 16-bit
- since the byte and word times are the same but the 32-bit time is
- greater (the byte and word operations each take one memory access but
- the dword takes two).
-
- 3. The memory from B8000 - BFFFF is the video memory that is currently
- being used. The times for it vary wildly each time MEMSPD is run,
- probably because the CPU accesses are often (and randomly) blocked
- because the memory is being accessed by the video card to display
- the screen.
-
- 4. The memory from C0000 - C7FFF, which is the video card ROM, is 16-bit.
-
- 5. The memory from C8000 - DFFFF contains various other ROMs as well as
- areas that are unused. The times here, since they include areas for
- which there is no memory, represent the basic bus byte access times.
-
- 6. The memory from F0000 - FFFFF is the BIOS. The times shown are with
- BIOS shadowing enabled, and are the same as the 32-bit memory. If
- BIOS shadowing is turned off, the times become:
- byte 0.76, word 0.76, dwrd 1.37
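
The reasoning used in the list above (equal byte, word and dword times
mean 32-bit memory; equal byte and word times with a slower dword mean
16-bit memory) can be written as a small rule. This is a Python sketch
with a hypothetical function name; the 10% tolerance is an arbitrary
assumption, and noisy regions such as the active video memory will not
classify reliably.

```python
def infer_width_bits(byte_us, word_us, dword_us, tolerance=0.10):
    """Guess the data path width from byte/word/dword access times."""

    def same(a, b):
        # Treat two timings as equal if they differ by at most `tolerance`
        # (relative). The tolerance value is an assumption, not MEMSPD's.
        return abs(a - b) <= tolerance * max(a, b)

    if same(byte_us, word_us) and same(byte_us, dword_us):
        return 32   # a dword costs no more than a byte
    if same(byte_us, word_us):
        return 16   # a word costs one access, a dword costs more
    return 8        # even a word costs more than a byte


# Values taken from the results in this file (single-number or hit times).
print(infer_width_bits(0.24, 0.24, 0.24))   # 00000 - 9FFFF
print(infer_width_bits(0.76, 0.76, 1.37))   # A0000 - B7FFF
print(infer_width_bits(1.18, 2.19, 4.24))   # C8000 - DFFFF
```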
-
- Running MEMSPD with the CPU cache disabled gives the second line below
- (the first line repeats the cache-enabled result for comparison):
-
- 00000 - 9FFFF byte 0.24-0.33, word 0.24-0.42, dwrd 0.24-0.59
- 00000 - 9FFFF byte 0.42, word 0.42, dwrd 0.42
-
- This points out a very interesting attribute of this machine's CPU cache -
- while cache hits are much faster than accesses without the cache (0.24
- versus 0.42), cache misses for word accesses cost the same as uncached
- accesses (0.42), and cache misses for 32-bit accesses are noticeably more
- expensive (0.59 versus 0.42).
-
- In order to determine if these numbers are reasonably accurate, I did the
- following computations:
-
- 1. A 25 MHz machine has a clock cycle time of .04 microseconds
- (1/25,000,000). The Intel 386 book shows that LODS takes 6
- clocks per repeated operation, and in fact 6 * .04 is the .24
- that MEMSPD reports for LODS with no wait states (cache hit).
-
- 2. According to the EISA spec (which is the closest thing I have to any
- sort of documentation for the ISA bus), it takes 7 bus clocks to
- access an 8-bit ISA slave. An 8 MHz bus has .125 usec per cycle
- (1/8,000,000), for a total of .875 usec. Adding the .24 usec it takes
- to access memory via LODS gives 1.115, which is very close to the
- value of 1.18 measured for the memory from C8000-DFFFF.
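
The two computations above can be reproduced in a few lines. This is a
Python sketch that uses only the figures quoted in the text; the variable
names are my own.

```python
# Reproduces the sanity-check arithmetic with the figures quoted above.
CPU_HZ = 25_000_000       # 25 MHz CPU clock
LODS_CLOCKS = 6           # clocks per repeated LODS, per the Intel 386 book
BUS_HZ = 8_000_000        # 8 MHz bus clock
ISA_8BIT_CLOCKS = 7       # bus clocks to access an 8-bit ISA slave

cpu_cycle_us = 1e6 / CPU_HZ                 # CPU clock period in usec
lods_us = LODS_CLOCKS * cpu_cycle_us        # LODS with no wait states
bus_cycle_us = 1e6 / BUS_HZ                 # bus clock period in usec
predicted_us = ISA_8BIT_CLOCKS * bus_cycle_us + lods_us

measured_us = 1.18                          # byte time for C8000 - DFFFF
relative_gap = (measured_us - predicted_us) / measured_us

print(f"LODS, no wait states: {lods_us:.2f} usec")
print(f"predicted 8-bit ISA access: {predicted_us:.3f} usec")
print(f"gap to measured value: {relative_gap:.1%}")
```

The predicted and measured byte times differ by roughly 5 to 6 percent.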
-
- The IOB and IOW options test the access times for I/O accesses for byte and
- word accesses. If IOW specifies a port address of either 1F0 or 170, and if
- MEMSPD thinks there is a disk controller at that address, it will start a
- read of the first sector and wait for DRQ to be asserted before doing the
- test. I have discovered that some WD1003 compatible controllers will do
- 16-bit I/O cycles when the data register is read only if there is data to
- transfer, while others always do 16-bit cycles. Doing the test only when
- the controller is transferring data ensures that 16-bit transfers will be
- done.
- IOW with a port address of 170 or 1F0 should not be done if there is any
- kind of disk cache or other TSR loaded that does disk I/O, or corruption
- of disk data could occur. Likewise, IOB and IOW should not be given the
- port address of an existing device if there is a TSR loaded that is using
- the device.
-
- I have included all of the source files. The assembler I used is a
- non-standard one that we use at work. The C was compiled with a Metaware
- compiler. MEMSPD was written very quickly; it is not pretty, and the
- source is not documented very well.
-
- Jack Jackson. 70152,3713
- 3/31/93